Data Quality Mining: Employing Classifiers for Assuring Consistent Datasets
نویسنده
چکیده
Independent from the concrete definition of the term “data quality” consistency always plays a major role. There are two main points when dealing with the data quality of a database: Firstly, the data quality has to be measured, and secondly, if is necessary, it must be improved. A classifier can be used for both purposes regarding consistency demands by calculating the distance of the classified value to the stored value for measuring and using the classified value for correction.
منابع مشابه
Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets
Objective(s): This study addresses feature selection for breast cancer diagnosis. The present process uses a wrapper approach using GA-based on feature selection and PS-classifier. The results of experiment show that the proposed model is comparable to the other models on Wisconsin breast cancer datasets. Materials and Methods: To evaluate effectiveness of proposed feature selection method, we ...
متن کاملEnhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
متن کاملDocument Type Classification in Online Digital Libraries
Online digital libraries make it easier for researchers to search for scientific information. They have been proven as powerful resources in many data mining, machine learning and information retrieval applications that require high-quality data. The quality of the data highly depends on the accuracy of classifiers that identify the types of documents that are crawled from the Web, e.g., as res...
متن کاملEvaluation of Classifiers in Software Fault-Proneness Prediction
Reliability of software counts on its fault-prone modules. This means that the less software consists of fault-prone units the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules of software, it will be possible to judge the software reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...
متن کاملA Novel Ensemble Classifier based Classification on Large Datasets with Hybrid Feature Selection Approach
Exploring and analyzing large datasets has become an active research area in the field of data mining in the last two decades. There had been several approaches available in the literature to investigate the large datasets that comprise of millions of data. The most important data mining approaches involved in this task are preprocessing, feature selection and classification. All the three appr...
متن کامل